Language Identification of Kannada, Hindi and English Text Words Through Visual Discriminating Features
نویسندگان
چکیده
In a multilingual country like India, a document may contain text words in more than one language. For a multilingual environment, multi lingual Optical Character Recognition (OCR) system is needed to read the multilingual documents. So, it is necessary to identify different language regions of the document before feeding the document to the OCRs of individual language. The objective of this paper is to propose visual clues based procedure to identify Kannada, Hindi and English text portions of the Indian multilingual document.
منابع مشابه
Discrimination of English to other Indian languages (Kannada and Hindi) for OCR system
India is a multilingual multi-script country. In every state of India there are two languages one is state local language and the other is English. For example in Andhra Pradesh, a state in India, the document may contain text words in English and Telugu script. For Optical Character Recognition (OCR) of such a bilingual document, it is necessary to identify the script before feeding the text w...
متن کاملScript Identification from Printed Document Images Using Statistical Features
Automatic identification of a script in a document image facilitates many important applications such as automatic archiving of multilingual documents; searching online archives of document images and for the selection of script specific OCR in a multilingual environment. In this work a technique for script identification from document images is proposed. The method uses vertical and horizontal...
متن کاملScript Identification from Trilingual Documents using Profile Based Features
In a multi script environment, majority of the documents may contain text information printed in more than one script/language. For automatic processing of such documents through Optical Character Recognition (OCR), it is necessary to identify different script regions of the document. In this paper, it is proposed to develop a model to identify the script type of a trilingual document printed i...
متن کاملScript Identification of Text Words from a Tri Lingual Document Using Voting Technique
In a multi script environment, majority of the documents may contain text information printed in more than one script/language forms. For automatic processing of such documents through Optical Character Recognition (OCR), it is necessary to identify different script regions of the document. In this context, this paper proposes to develop a model to identify and separate text words of Kannada, H...
متن کاملNative Language Identification Based on English Accent
Present work is aimed at investigating the influence of mother tongue (L1) of a South Indian speaker on a second language (L2). Second language can be a dominant local language, national language in India i.e., Hindi or a connecting language English. In the current study, L2 is a short discourse in English. Cepstral and prosodic features were used as in Language Identification (LID) to distingu...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Int. J. Computational Intelligence Systems
دوره 1 شماره
صفحات -
تاریخ انتشار 2008